Amendments to the Claims:

This listing of claims will replace all prior versions, and listings, of claims in the application: 



Listing of Claims: 

1. (Currently Amended) A computer implemented method of identifying and extracting desired content in HTML formatted web pages, comprising the steps of:

selecting a model page, wherein the model page includes content data and a plurality of HTML tags for formatting the content data;

identifying a first area of interest in the model page;

parsing the model page to generate a first string of symbols for the plurality of HTML tags, the generated symbols in the first string representing only HTML tags, wherein the first area of interest is identified by a first portion of the first string of symbols;

retrieving a second web page associated with a different URL than the model page;

parsing the second web page to generate a second string of symbols for a plurality of HTML tags of the second web page, the generated symbols in the second string representing only HTML tags; and

comparing the first and second symbol strings to determine whether the second string includes a second portion similar to the first portion of the first string, wherein the second portion corresponds to a second area of interest in the second page.
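The parsing and comparing steps of claim 1 can be illustrated with a minimal Python sketch. It assumes Python's standard `html.parser` module as the parser, writes each tag as a hypothetical symbol such as `<td>` or `</td>`, and discards all text content so that each symbol string represents only HTML tags. A plain exact search stands in for "similar" here; the names `tag_symbols` and `find_portion` are illustrative only and are not the applicant's implementation.

```python
from html.parser import HTMLParser


class TagSymbolParser(HTMLParser):
    """Collects one symbol per HTML tag; text content is discarded."""

    def __init__(self):
        super().__init__()
        self.symbols = []

    def handle_starttag(self, tag, attrs):
        # Attributes are ignored in this sketch; only the tag name becomes a symbol.
        self.symbols.append("<" + tag + ">")

    def handle_endtag(self, tag):
        self.symbols.append("</" + tag + ">")


def tag_symbols(html):
    """Return the list of tag symbols for an HTML string."""
    parser = TagSymbolParser()
    parser.feed(html)
    parser.close()
    return parser.symbols


def find_portion(pattern, text):
    """Start index of the first exact occurrence of pattern in text, or -1."""
    n = len(pattern)
    for i in range(len(text) - n + 1):
        if text[i:i + n] == pattern:
            return i
    return -1


model_html = "<html><body><table><tr><td>Price</td><td>$10</td></tr></table></body></html>"
second_html = "<html><body><h1>Deals</h1><table><tr><td>Lamp</td><td>$25</td></tr></table></body></html>"

model_syms = tag_symbols(model_html)
area = model_syms[3:9]  # hypothetical first area of interest: the <tr> row
second_syms = tag_symbols(second_html)
print(find_portion(area, second_syms))  # 5
```

Because the content ("Price", "Lamp", the amounts) never enters the symbol strings, the two rows match even though their text differs.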

2. (Original) The method of claim 1, wherein the step of comparing includes applying an approximate pattern matching algorithm to the first and second strings.
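Claim 2 names no particular approximate pattern matching algorithm. One well-known option, assumed here purely for illustration, is Sellers' semi-global edit-distance search, which finds the substring of the second symbol string with the smallest edit distance to the first portion. The sketch below operates on lists of tag symbols; `approx_match` is a hypothetical name.

```python
def approx_match(pattern, text):
    """Sellers-style approximate matching over symbol lists.

    Returns (best_cost, end), where best_cost is the minimum edit distance
    between pattern and any substring text[s:end].
    """
    m = len(pattern)
    prev = list(range(m + 1))  # column for the empty text prefix
    best_cost, best_end = prev[m], 0
    for j, sym in enumerate(text, start=1):
        curr = [0] * (m + 1)  # curr[0] = 0: a match may start anywhere in text
        for i in range(1, m + 1):
            substitute = prev[i - 1] + (0 if pattern[i - 1] == sym else 1)
            extra_text_symbol = prev[i] + 1
            missing_pattern_symbol = curr[i - 1] + 1
            curr[i] = min(substitute, extra_text_symbol, missing_pattern_symbol)
        if curr[m] < best_cost:
            best_cost, best_end = curr[m], j
        prev = curr
    return best_cost, best_end


area = ["<tr>", "<td>", "</td>", "<td>", "</td>", "</tr>"]
# Hypothetical second page whose row gained a <b> element.
row = ["<table>", "<tr>", "<td>", "<b>", "</b>", "</td>",
       "<td>", "</td>", "</tr>", "</table>"]
cost, end = approx_match(area, row)
print(cost, end)  # 2 9
```

A cost of 2 reflects the two inserted `<b>`/`</b>` symbols; a caller would accept the match if the cost falls below some threshold, which the claims leave open.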

3. (Original) The method of claim 1, further comprising the step of storing the first and second areas of interest in a database.






Appl. No. 09/645,479 
Amdt. dated January 21, 2005
Amendment/RCE Submission 



PATENT 



4. (Currently amended) The method of claim 1, further comprising the step of extracting content data in the second area of interest from the second page.

5. (Original) The method of claim 4, further comprising the step of applying a regular expression matching algorithm to the extracted second area of interest.
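Claim 5 does not specify which regular expression is applied. As a hypothetical illustration, assuming the extracted second area of interest has already been reduced to its text content and contains prices, Python's `re` module can pull out the price fields. The pattern `PRICE` and the function `prices_in_area` are illustrative assumptions.

```python
import re

# Hypothetical pattern for a dollar price such as "$25" or "$31.50";
# the claims do not say which regular expression is used.
PRICE = re.compile(r"\$(\d+(?:\.\d{2})?)")


def prices_in_area(extracted_text):
    """Return every price-like value found in the extracted area's text."""
    return [float(m.group(1)) for m in PRICE.finditer(extracted_text)]


area_text = "Desk lamp  Was $31.50  Now $25"
print(prices_in_area(area_text))  # [31.5, 25.0]
```

The tag-symbol comparison locates the area structurally; the regular expression then refines what is taken from inside it.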

6. (Original) The method of claim 1, wherein the first and second areas of interest each include two or more distinct sub-areas of the respective page.

7. (Original) The method of claim 1, wherein the step of identifying a first area of interest includes the step of identifying portions of the HTML tags of the model page.

8. (Original) The method of claim 1, wherein the step of identifying a first area of interest is performed using a manual pointing and selecting device.

9. (Original) The method of claim 1, wherein the steps of selecting and identifying are performed manually and wherein the remaining steps are performed automatically.

10. (Original) The method of claim 1, wherein the second web page is retrieved from a remote website over the Internet.

11. (Original) The method of claim 1, wherein the HTML tags include attributes and attribute values.

12. (Currently amended) A computer readable medium containing instructions for controlling a computer system to automatically identify and extract desired content in a retrieved HTML formatted web page, by automatically:

parsing the HTML code of a manually selected model web page to generate a first string of symbols for a first plurality of HTML tags, the generated symbols in the first string representing only HTML tags;

retrieving a second web page associated with a different URL than the model web page;

parsing the HTML code of the second web page to generate a second string of symbols for HTML tags of the second page, the generated symbols in the second string representing only HTML tags; and

comparing the first and second symbol strings to determine whether the second page includes a second plurality of HTML tags substantially matching the first plurality of HTML tags.

13. (Original) The computer readable medium of claim 12, wherein the first plurality of HTML tags are identified by an operator using a pointing and selection device coupled to the computer system.

14. (Original) The computer readable medium of claim 12, wherein the second web page is retrieved from a remote website over the Internet.

15. (Original) The computer readable medium of claim 12, further including instructions for extracting a portion of the second page corresponding to the second plurality of HTML tags.

16. (Original) The computer readable medium of claim 15, wherein the instructions further control the computer system to store the extracted portion of the second page in a database.

17. (Original) The computer readable medium of claim 15, further including instructions for controlling the computer system to apply a regular expression matching algorithm to the extracted portion of the second page.

18. (Original) The computer readable medium of claim 15, wherein the extracted portion of the second page includes two or more distinct sub-areas.

19. (Original) The computer readable medium of claim 12, wherein the instructions for comparing include instructions for applying an approximate string matching algorithm to the first and second strings.

20. (Original) The computer readable medium of claim 12, wherein the HTML tags include attributes and attribute values.

21. (Currently amended) A computer system for identifying and extracting content from HTML formatted web pages, the system comprising:

means for retrieving web pages including content data and HTML tags for formatting the content data, wherein a model web page is retrieved;

means for manually identifying a first area of interest in the model page, wherein the first area of interest corresponds to a first plurality of HTML tags; and

a processor including:

means for parsing a page, wherein the parsing means parses the model page and generates a first string of symbols for the first plurality of HTML tags, the generated symbols in the first string representing only HTML tags, and wherein the parsing means thereafter parses an automatically retrieved second web page associated with a different URL than the model page and generates a second string of symbols for HTML tags of the second web page, the generated symbols in the second string representing only HTML tags;

means for comparing the first and second symbol strings to determine whether the second string includes a second portion similar to the first portion of the first string, wherein the second portion corresponds to a second area of interest in the second page; and

means for extracting content data in the second area of interest from the second page.
22. (Currently amended) A computer implemented method of identifying and extracting desired content in web pages formatted using a markup language, comprising the steps of:

selecting a model page, wherein the model page includes a plurality of tokens, wherein tokens include HTML tag elements and content elements;

identifying a first area of interest in the model page;

parsing the model page to generate a first string of symbols for the plurality of tokens in the model page, the generated symbols in the first string representing only tag elements, wherein the first area of interest is identified by a first portion of the first string of symbols;

retrieving a second web page associated with a different URL than the model page;

parsing the second web page to generate a second string of symbols for a plurality of tokens of the second web page, the generated symbols in the second string representing only tag elements; and

comparing the first and second symbol strings to determine whether the second string includes a second portion similar to the first portion of the first string, wherein the second portion corresponds to a second area of interest in the second page.
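Claim 22 distinguishes tag elements from content elements and generates symbols only for the former. A minimal sketch, again assuming Python's `html.parser` and using hypothetical names (`tokenize`, `tag_string`, `content_between`), records each tag symbol's position in the full token list so that the content elements inside a matched area can be recovered afterwards, as claim 23 recites.

```python
from html.parser import HTMLParser


class Tokenizer(HTMLParser):
    """Splits markup into ("tag", symbol) and ("content", text) tokens."""

    def __init__(self):
        super().__init__()
        self.tokens = []

    def handle_starttag(self, tag, attrs):
        self.tokens.append(("tag", "<" + tag + ">"))

    def handle_endtag(self, tag):
        self.tokens.append(("tag", "</" + tag + ">"))

    def handle_data(self, data):
        if data.strip():  # skip whitespace-only runs between tags
            self.tokens.append(("content", data.strip()))


def tokenize(markup):
    t = Tokenizer()
    t.feed(markup)
    t.close()
    return t.tokens


def tag_string(tokens):
    """Symbols for tag elements only, plus each symbol's index in tokens."""
    symbols, positions = [], []
    for idx, (kind, value) in enumerate(tokens):
        if kind == "tag":
            symbols.append(value)
            positions.append(idx)
    return symbols, positions


def content_between(tokens, positions, start, end):
    """Content elements spanned by tag symbols start .. end-1."""
    lo, hi = positions[start], positions[end - 1]
    return [value for kind, value in tokens[lo:hi + 1] if kind == "content"]


page = "<ul><li>Lamp <b>$25</b></li><li>Desk</li></ul>"
tokens = tokenize(page)
symbols, positions = tag_string(tokens)
# Hypothetical matched area: the first <li> ... </li>, i.e. symbols[1:5].
print(content_between(tokens, positions, 1, 5))  # ['Lamp', '$25']
```

Keeping the position list lets the comparison run on tag symbols alone while extraction still reaches the original content elements.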

23. (Currently amended) The method of claim 22, further comprising the step of extracting content elements in the second area of interest from the second page.

24. (Original) The method of claim 22, wherein the markup language is selected from the group consisting of HTML, XML, WML, DHTML and HDML.

25. (Canceled).

26. (Currently amended) A computer-implemented method of identifying similar content in HTML formatted web pages, the method comprising:

selecting a model page, wherein the model page includes content data and a plurality of HTML tags for formatting the content data;

identifying a first area of interest in the model page;

generating a first string of symbols for the plurality of HTML tags associated with the first area of interest, the generated symbols in the first string representing only HTML tags;

retrieving a second web page associated with a different URL than the model page;

generating a second string of symbols for the HTML tags of the second web page, the generated symbols in the second string representing only HTML tags; and

comparing the first and second symbol strings to determine whether the second string includes a portion similar to the first string, wherein the portion corresponds to a second area of interest in the second page.

27. (Currently amended) The method of claim 26, further comprising extracting content data in the second area of interest from the second page.

28. (Previously presented) The method of claim 26, wherein identifying is performed manually using a user-input device.